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O , Abstract 

fj . The evaluation of long-range potentials in periodic, many-body systems arises as a necessary step in the numer- 

•i-H ' ical modeling of a multitude of interesting physical problems. Direct evaluation of these potentials requires 0{N^) 

K^^^, operations and 0{N^) storage, where A'^ is the number of interacting bodies. In this work, we present a method, 

(-H ' which requires 0{N) operations and 0{N) storage, for the evaluation of periodic Helmholtz, Coulomb, and Yukawa 

Oj' potentials with periodicity in 1-, 2-, and 3-dimensions, using the method of Accelerated Cartesian Expansions (ACE). 

We present all aspects necessary to effect this acceleration within the framework of ACE including the necessary 
translation operators, and appropriately modifying the hierarchical computational algorithm. We also present several 
results that validate the efficacy of this method with respect to both error convergence and cost scaling, and derive 
error bounds for one exemplary potential. 
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1. Introduction 



^\ ' The evaluation of potential functions in many-body systems subject to periodic or quasi-periodic bound- 

ary conditions is a computationally demanding task that arises frequently in the numerical modeling of 
physical systems. Among the many contexts in which such calculations arise are the analysis of electromag- 
netic wave propagation in photonic bandgap structures [1,2], frequency selective structures [3], cosmological 
structure formation [4], defects in the solid state [5,6], etc. Numerical methods specific to the solution of 
these types of problems require the repeated evaluation of periodic potentials, whether in the application of 
an iterative solver, or in the step-by-step updating of energies and force fields in a time integration scheme. It 
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is well-known that direct approaches to the evaluation of periodic potentials require 0{N'^) operations and 
0{N'^) storage, where N is the number of unknown quantities in a single unit cell of the periodic structure. 
Consequently, there is a need for the development of fast methods that mitigate this quadratic scaling. 

While this work is concerned with such methods for periodic/quasi-periodic potentials, these algorithms 
have been researched extensively for non-periodic potentials, and we first discuss the more general aspects of 
fast methods for arbitrary pairwise potentials. The history of such methods spans more than four decades, 
and during this period a number of distinct algorithms have been developed. They can be broadly categorized 
by whether acceleration is achieved through a hierarchical decomposition of the domain, as in tree codes 
[7,8] and fast multipole methods (FMMs) [9,10], or the discretization and resolution of the potential across 
multiple scales, as in particle-mesh [11] and multigrid methods [12]. Algorithms from the latter category 
predate hierarchical methods by more than a decade, and have matured considerably with time. The basic 
premise of particle-mesh/multigrid methods is the evaluation of the potential based upon data defined 
on a hierarchy of discretization scales. A global solution is generated rapidly at the coarsest scale, and 
local corrections are then generated and applied based upon data at the finest scale. Perhaps the oldest of 
these methods is particle-particle/particle- mesh (P'^M), first published in 1973 in the context of molecular 
dynamics simulations [13]. The basis of P'^M is the interpolation of point sources onto a mesh at a coarser 
scale (particle- mesh), from which a continuum source distribution can be defined, at which point the potential 
is computed using an FFT-based PDE solver for the associated continuum problem. While this reproduces 
the potential accurately due to long-range interactions, information about short-range interactions is lost in 
interpolation, and corrections are subsequently applied by way of the direct (particle-particle) evaluation of 
the potential due to point sources which are in close spatial proximity. Some other, closely related methods 
include particle-mesh Ewald (PME), smooth PME, and multigrid methods. Typically, these methods require 
0{N log N) overhead in terms of both number of operations and storage. An extensive list of references 
concerning mesh-based methods can be found in [14]. 

In spite of their frequent conflation in the literature, tree codes and FMMs represent two distinct ap- 
proaches to the efhcient evaluation of pairwise potentials that are best explained in [15]. The first tree codes, 
due to Barnes and Hut [7], were based on the observation that the gravitational/Coulombic potential eval- 
uated at a point due to sources far away can be accurately represented in terms of a truncated multipole 
expansion of the source distribution. More generally, in tree codes the interaction between a source-observer 
pair is computed using one of three options: (i) directly, (ii) at each observation point using the multipole 
expansion due to a cluster of sources, or (iii) using local expansions at a cluster of observers. The decision 



as to which operation is to be used is made to guarantee computational efhciency. Given the generahty of 
presentation, it is worth noting that tree codes have been developed for a variety of potentials [8,16]. FMMs 
take this principle one step further, by introducing aggregation and disaggregation operators [9], that permit 
the computation of potentials in a completely optimal manner [15]. In general, these methods lead to a cost 
complexity with 0{N log" N) scaling, where a € [0, 1] depends upon the distribution of unknowns. Among 
the principle advantages of tree/fast multipole methods is that they have mathematically rigorous bounds 
on error, which are often lacking in mesh-based methods. 

1.1. Accelerated Cartesian Expansions - A brief introduction 

The method of Accelerated Cartesian Expansions (ACE) is a tree-based method similar in spirit to FMMs, 
in so far as it includes aggregation and disaggregation operators. Classical FMMs rely on constructing a 
representation of the Green's function in terms of special functions; for instance, the Coulombic Green's 
function is represented in terms of spherical harmonics. The ACE algorithm, on the other hand, is contingent 
upon constructing a representation in terms of a generalized Taylor expansion that is expressed in terms 
of totally symmetric tensors which are used to reduce the overall cost relative to other Cartesian methods. 
In 2007, ACE was introduced for the evaluation of potentials of the form r^"^ [17]. While it is based on 
Taylor series expansions, it was shown that it is possible to develop exact aggregation and disaggregation 
operators. More interestingly, using the well-known equivalence between traceless Cartesian tensors and 
Legendre polynomials, it was shown that it is possible to develop relationships, both in terms of cost and 
operations, between ACE and classical FMM. As ACE is not wedded to addition theorems for special 
functions, it is possible to apply these to the rapid evaluation of many different potentials. To date, this 
has been done for potentials of the form r^" {i^ E R) [17], Lienard-Wiechert potentials [18], and diffusion, 
Klein-Gordon, and lossy wave potentials [19]. Likewise, ACE has also been implemented together with FMM 
for the wideband analysis of electromagnetic phenomena [20] , with analytically derived error bounds that 
have been demonstrated via numerical experimentation. 

1.2. Earlier Work in Periodic Tree-Based Methods 

While tree-based methods (including ACE) have been studied extensively in the context of non-periodic 
problems [21,22], their adaptation to periodic problems is encountered less frequently in the available liter- 
ature. We attribute this to two factors, (i) periodic boundary conditions are often employed in situations in 
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which they are already being used to decrease the effort required for a particular calculation, and (ii) a diffi- 
culty in constructing and evaluating the necessary translation operators. This first factor essentially implies 
that tree-based methods will only be useful for periodic problems in which the unit cell is either very large, 
or very densely discretized. This is, however, problem-dependent, and a number of interesting applications 
exist in which these requirements are met. The latter factor stems from the nature of the periodic Green's 
function for long-range interactions, namely that it is typically some manner of infinite sum, in which case 
the translation operator will not only be difficult to derive, but might require significant computational 
overhead. 

Early work in adapting tree-based methods to periodic systems were focused on Coulombic systems. In 
their seminal paper on the FMM [9], Greengard and Rokhlin implement periodic boundary conditions, as 
well as Dirichlet and Neumann, for the two-dimensional Coulomb potential. Schmidt and Lee later extended 
this approach by incorporating rapidly convergent Ewald summations [10,23]. Challacombe, et. al. [24] 
published results based upon the efficient and accurate evaluation of lattice sums of spherical harmonics, 
enabling improvements in both the computational cost and memory overhead, relative to extant methods of 
the time. One particularly interesting extension due to Lambert, et al [25] utilizes the hierarchical structure 
of the FMM to accelerate the aggregation of multipole expansions of periodic image cells, avoiding the direct 
evaluation of lattice sums. Similar methods for the periodic Helmholtz kernel have seen more punctuated 
development. While Rokhlin and Wandzura have presented work applying the FMM to periodic Helmholtz 
problems [26], the extent of this work was limited to the rapid calculation of matrix elements arising in 
the solution of electromagnetic integral equations, rather than the calculation of the potential itself (i.e. 
the associated matrix- vector product). The FMM was not successfully employed in the iterative solution of 
electromagnetic integral equations until over a decade later in the work of Otani and Nishimura [1]. 

The majority of the previously referenced tree-based methods incorporate periodic boundary conditions 
only at the top of the tree, corresponding to a multipole expansion of the entire unit cell. The local expansion 
due to the infiuence of the rest of the lattice, excepting the nearfield of the unit cell, is evaluated using lattice 
sums. At all levels below the top, free space translation operators are used, taking into account not only the 
effect of boxes lying inside of the unit cell, but those in its nearfield as well. In this paper, we follow a more 
conventional approach, viz., use the addition theorem for the full periodic Green's function for multipole-to- 
local translations. This is very similar in spirit to what has been done using interpolatory methods [27-29], 
and in fast time domain methods [30] in electromagnetics. As a consequence, all such translations will be 
restricted to the interior of a single unit cell, reducing the total number of operations required per tree 



traversal relative to these other methods, at the expense of requiring more complex translation operators. 
Given differences between test architectures and implementation, it is difficult to draw direct comparisons 
between timings for our method and others available in the literature. However, we provide extensive results 
in Section 4 that demonstrate an exceptional acceleration relative to direct evaluation, and breakeven points 
that are clearly very competitive with extant methods. 

f.3. Outline of Contents 

In this paper, we demonstrate the extension of the ACE algorithm to a wide array of periodic potentials, 
from Coulomb to Yukawa to Helmholtz. In doing so, we discuss the algorithmic changes required, namely 
the derivation and evaluation of periodic translation operators and how interaction lists are constructed. We 
do not seek to tie this work to the solution of any particular problem (i.e. integral equation solvers, A^-body 
dynamics, etc), although we note that we have adapted our method to the solution of integral equations 
which arise in the analysis of electromagnetic wave propagation, and have submitted it to a more appropriate 
forum [31]. The principal contributions of this paper are as follows: 

(i) Derivation of the necessary translation operators for periodic Helmhholtz, Yukawa, and Coulomb 
potentials on physically relevant lattices (singly, doubly, and triply periodic) 

(ii) Algorithmic changes for constructing ACE interaction lists on periodic domains 

(iii) Error bounds on the associated expansions 

In Section 2, we provide mathematical details concerning the class of problems we aim to solve. Our approach 
is sufficiently general that we can succinctly present details for periodic Coulomb, Yukawa, and Helmholtz 
potentials for singly, doubly, and triply periodic lattices. In Section 3, the ACE algorithm is reviewed, and 
details of its implementation for periodic domains are provided. Finally, in Section 4, error convergence and 
scaling are demonstrated. Details concerning periodic Green's functions, derivations of the necessary ACE 
translation operators, and the associated error bounds are given in the Appendices. This is done to improve 
the overall readability of the manuscript. 

2. Mathematical Framework 

2.1. Statement of the Problem 

Consider a domain, il C M^, containing a source distribution, /o(r), that gives rise to an unknown potential, 
ip{r), governed by one of the following partial differential equations (PDEs): 

5 



(V^ + K^) ip{r) = -p{r) (Hclmholtz Equation) (la) 

(V^ - K^) tp{r) = -p{r) (Yukawa Equation) (lb) 

V^V'(r) = — p(r) (Poisson Equation) (Ic) 

Here, k G M+, is a problem-dependent constant, which corresponds to the wavenumber for the Helniholtz 
equation, and an inverse screening length for the Yukawa equation. Assume that p(r) is periodic with respect 
to a /i-dimensional lattice, >C^, defined as: 

£^ := < t(n^) = ^n^ajln^ e Z I (2) 

Here, n^ is used as a short-hand notation for the /i-tuple (ni,...,np), and a^ arc the primitive vectors 
associated with some Bravais lattice. For the sake of simplicity, we will consider a simple orthorhombic 
lattice. One can define the associated reciprocal lattice, £* : 

L; := i k(n^) = J2 "«b.|n, e Z I (3) 

Here, b^ are the primitive reciprocal lattice vectors, which together with the primitive lattice vectors satisfy 
SLi (X) hj — 2TT6ij, where 6ij is the Kronecker tensor. 

Using these definitions, we can describe the periodicity of p(r) as follows: 

p(r + t(n^))=p(r):t(n^)e£^ (4) 

Given this constraint on p(r), we can fully characterize it over a reduced volume, namely the primitive 
cell, flfj^ C il, the minimal set which reflects the translational symmetry of the lattice. We can completely 
reconstruct 51 from a union of primitive cells shifted by all lattice vectors in >C^: 

n = {Q^+t{n^)\t{n^)eC^} (5) 

Here, we refer to the set fl^ + t(n^) = fin^ as the n^th image cell, where r2(o....o) = ^fj. is the central 
primitive cell. We denote the measure of space occupied by a single primitive cell in the subspace of the 
lattice as Afj, (i.e. Ai ^length of primitive cell, etc). Fig. (1) provides a pictorial representation of 57 for /i = 2. 

By the translational invariancc of the PDEs in (1), that p(r) is periodic is sufficient to guarantee the 
periodicity of V'(r), as well: 



V'(r)=^(r + t(n^)):tK)e£^ (6) 

While this periodicity wiU suffiee to constrain aU boundary conditions when the codimension of the lattice 
is zero, for situations in which it is non-zero we must consider the 3 — /i unspecified boundaries. We consider 
these boundaries to behave as unbounded space, as this is frequently the physically relevant choice for a 
number of modeling scenarios. In other words, for the Yukawa and Coulomb potentials, we consider the 
situation in which the potential decays as we recede away from the lattice, and for the Helmholtz potential, 
we consider a Sommcrfeld boundary condition. For the Helmholtz equation, we may also be interested in 
quasi-periodic boundary conditions, wherein a phase factor due to a non-zero Floquet wavenumbcr will arise 
as each cell is traversed. However, in the interest of maintaining a unified approach to the three potentials, 
we relegate a full discussion of this to Appendix E. 

Having fully specified the problem, i.e., PDE and boundary conditions, we seek solutions for '>p{r). Given 
the periodic boundary conditions, we can determine our potential solution completely in terms of p(r) for 
re n^: 

^v)= j dv'G,{\v-v'\)p{v') (7) 

Of, 

Here, G^(|r — r'|) is the appropriate periodic Green's function; a more complete and rigorous discussion of 
its derivation and evaluation can be found in Appendices A and B. Given its rapid and absolute convergence 
in and away from C^, we utilize the Ewald representation of the periodic Green's function [32-38]: 

G^(|r-r'|)= Y.Er{\v-v' + t{n,)\)+ ^ £fe(r - r',kK)) (8) 

t(n^) k(n^) 

Following the usual convention, the first summation will be referred to as the 'real sum' and the second as 
the 'reciprocal sum'. The functional form of the terms in the real sum depend upon the type of potential 
being evaluated: 

g±iK|r-r'+t(nf,)| / ^ 



£,(|r-r'+tK)|)=^^^— — ^^^^-^er/c(^r,|r-r' + tK)|±*— j (Helmholtz) (9a) 



g±K|r-r'+t(n^)| , ^ 



erfc (ryjr — r' + t(n^)|) (Poisson) (9c) 



47r|r-r'-ht(n^)| 

Here, erfc is the complimentary error function [39], and r/ G ffi.^ is deemed the splitting parameter. The 
form of the terms in the reciprocal sum depend upon both ji and the type of potential, by way of a function 



a(n 



iJ-> 






8,{v r', k(n,)) ^ -^-j- Y: '-^i\rt\v?'E,^, ( 4;^ ) (a^ = 1) (10a) 

^^0 



(m - 3) (10c) 



^3Q;2(n3) 

Here, En is the exponential integral of nth order [39], and r; and ri are projections of r — r' which are 
parallel and transverse to the span of the lattice vectors, respectively, and a{n^) is defined as follows for 
the potentials of interest: 



a(n^) == J|k(n^)r - k^ (Helmholtz) (11a) 



a(n^) = ^|k(n^)|" + n^ (Yukawa) (lib) 

a(n^) = |k(n^)| (Poisson) (lie) 

It is evident that this form of the Green's function possesses some singularities that will complicate the eval- 
uation of Eqn. (7), namely when |r — r'| — > or a(n^) — > 0. Unless otherwise indicated, we will implicitly 
exclude these singular contributions to the potential, and leave a more complete discussion of their proper 
treatment to Appendix A. 

Having specified G^(|r — r'|), given p(r), ■0(r) can be calculated for r e 17^ using Eqn. (7), furnishing 
a solution to the PDEs in Eqn. (1) subject to appropriate boundary conditions. The primary focus of this 
paper will be the rapid evaluation of a discrete form of the convolution in (7) using the ACE algorithm. In 
what follows, we will briefiy discuss this discretization, and then move onto the details of the ACE algorithm. 

2.2. Discretization of the Problem 

In discretizing Eqn. (7), without loss of generality, we consider the case where p(r) can be expressed as 
N point sources distributed throughout fl^ : 

N 

p{v) = Y,qp5{v-vp) (12) 

/3=1 

Here, r^ € fi^ is the location of the /?th discrete source, and g^ is its associated weight. We can now write 
V'(r) as: 



N 

^(r) = ^q^G^(|r-r^|) (13) 

/3=1 

Without loss of generality, we consider the calculation of the mutual interaction between these sources, viz. 
Eqn. (7) evaluated at all source points. We can reformulate this calculation as a matrix equation as follows: 

ip{r^) = Y,<l(sG^,{\r^~rp\) -^ '^a = {I - So^p) G^{\ro, - vpDQp (14a) 

*a = G^^Q^ (14b) 

Here, G is an A^ x A^ matrix in which the self-interaction terms are explicitly excluded and Q and ^ are A^ x 1 
column vectors with entries Q^ = qp and $„ — 'ip{ra). From this form of the potential, it is evident that 
the calculation of the potential will require 0{N'^) operations and 0{N'^) storage using direct methods. It is 
the goal of this work to demonstrate that using the ACE algorithm, the total cost of potential computation 
is reduced to 0{N) in time and 0{N) in storage. 

Thus far, we have presented the convolution in Eqns. (7) and (14b) as the solution to a particular set 
of PDEs. It is important to note that this does not limit the scope of this work to problems in which the 
explicit form of p(r) in fl^ is known a priori. In the integral equation formulation of numerous problems 
in applied math/physics, p(r) (discretely, Q) is an unknown to be resolved, and V'(r) (^) is known. In this 
context, iterative methods for the solution of Eqn. (14b) will require the repeated evaluation of matrix- vector 
multiplication with G. To this end the utility of fast potential evaluators, including ACE, is well-documented 
for a multitude of problems [1,2,18,20,22,40-42]. 

3. Rapid Evaluation of Periodic Potentials 

3.1. Description of the Algorithm 

ACE and other FMM-typc methods achieve an 0{N) cost in timing and storage by approximating Eqn. 
(14b) as follows: 

*a = G„^Q^ « GX'Q;3 + /:^^''(Q^) (15) 

Here, G^o""" is a sparse matrix, carrying only entries of Gap describing interactions between source-observer 
pairs which are in some metric, 'near', and C is some composition of linear operators which approximates 
the interactions between the remaining 'far' source-observer pairs. This operator will be defined more explic- 
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itly in Section 3.1.4. The ACE algorithm describes the metric that demarcates 'near' and 'far' interactions 
based upon a hierarchical decomposition of 17^, and then provides rules for performing both operations in 
Eqn. (15) in a manner that requires 0(N) operations, and 0{N) storage. 

Such a hierarchical decomposition is achieved by mapping all source and observer points onto a regular 
octree data structure, henceforth referred to as 'the tree'. This structure provides a natural metric by which 
'near' and 'far' interactions can be separated, as well as a means by which the application of C can be 

mapped onto the traversal of the tree. Using these notions, the ACE algorithm can be used to evaluate Eqn. 
(15) in the following steps: 

Algorithm 1 ACE Algorithm 



Construct the tree based upon discretization of p{r) and V'(r) in O^. 
Fill nearfield/farfield interaction lists based upon tree. 
Precompute nearfield matrix elements and ACE translation operators. 
Compute ^Q via sparse nearfield matrix multiplication and tree traversal. 



Steps 1-3 constitute pre-processing, which need be performed only once for a fixed source distribution. 
Step 4 is the only recurrent step required for the repeated evaluation of a potential in which the entries of 
Q vary. In what follows, we describe each step in detail and provide mathematical substantiation in Section 
(3.1.4). 

3.1.1. Constructing the Tree 

Tree construction is based upon the specification of the desired number of levels Ni , chosen to optimize 
the cost and/or error. The primitive cell, fl^ is recursively subdivided into boxes of equal volume A^; — 1 
times. A single level of the tree is defined by a set of boxes of equal volume, with 1 being the level with 
the smallest ('leaf') boxes. At a given level, a box subordinate to a larger box at the level above is deemed 
the child to the larger box's parent. Every box is assigned an address in octal, based upon the usual octree 
decomposition, allowing us to readily acquire the 'genealogy' of a particular box given its address alone. 
This method of addressing boxes is called Morton ordering or Z-space filling curves [43] . Each box at the leaf 
level carries a list of the point sources/observers lying inside its boundaries. Given the hierarchical structure 
of the tree, it is trivial to determine the point sources/observers subordinate to boxes at lower levels by 
recursively aggregating children until the leaf level is reached. 



10 



As will be explicitly discussed in the next section, the construction of the interaction lists requires the 
consideration of not just boxes that lie inside of il^, but their nearest images as well. This is done to properly 
catalog near and far interactions, as is elucidated in the next section. The images of the cell are included by 
adding two fictitious levels together with the tree hierarchy that represents the nearest images of f7^ . Figure 
2 illustrates this modification to the overall structure. Note, these image boxes will not store any sources or 
observers, and are not involved in tree traversal. Instead, they only serve as placeholders in addressing boxes, 
incurring a negligible computational overhead, and we do not consider their contribution to the height of 
the tree when it is referenced. As the addressing scheme of a regular octree follows Morton ordering, boxes 
inside the primitive cell will fall within a contiguous address space [43]. This provides a simple means by 
which real boxes can be distinguished from image boxes. As will be discussed in Section (3.1.2), knowledge 
of the nearest images of the primitive cell are essential in constructing the necessary interaction lists, which 
this extension of the conventional tree structure provides. 

3.1.2. Filling Interaction Lists 

The most crucial step in achieving a linear method using tree-based methods is choosing an appropriate 
rule for separating 'near' and 'far' interactions. In non-periodic domains, this demarcation is a straightfor- 
ward task. Two boxes are considered to be 'far' from one another if (i) they are separated by at least one 
box length and (ii) their parents are not 'far' from one another. The first portion of this criterion is naturally 
tied to the distance between boxes, and ensures that spatial variations in the relative potential of the two 
domains are limited. The latter portion, on the other hand, ensures that 'far' interactions are computed in 
0{N) operations. 

One of the primary challenges in adapting ACE to periodic domains is constructing a rule that meets both 
of these needs: minimal spatial variations in the relative potential as well as 0{N) scaling. This is some- 
what non-trivial, in that the periodic boundary conditions map the problem onto a domain with a toroidal 
topology. Consequently, the distance between two boxes is no longer unique, as each box will possess images 
that effectively contribute to the relative potential through the periodic Green's function. In determining 
whether or not two boxes in the primitive cell are 'far' from one another, we must then construct a rule 
which gives consideration to all image boxes. Such a rule is as follows: 

(i) The original boxes and all of their images are separated by at least one box length. 
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(ii) Among the parents of both the original and image boxes, at least one pair is not 'far'. 

Fig. (2) illustrates the 'near' and 'far' boxes for an exemplary leaf box in a three and four level tree. The 
contents of boxes outlined in green are stored explicitly in the tree, whereas boxes outlined in red are part 
of the fictitious extra levels discussed in the previous Section. For both the three and four level trees, it is 
worth noting the manner in which the nearfield wraps around the unit cell. The source box residing in the 
upper left corner of the primitive cell will participate in nearfield interactions with observer boxes residing 
at each of the other 3 corners, in spite of their apparent distance. We have observed poorer error convergence 
when using the usual, non-periodic rules for parsing interactions, so this is an important subtlety to keep in 
mind when adapting extant codes to periodic problems. 

Mechanically, the construction of the interaction lists is straightforward. For a given level, we know the 
range of addresses which lie inside the primitive cell, so we can explicitly iterate over only these boxes. This 
iteration begins at the {Ni — l)th level, just below the 'root' of the real tree, at which all boxes are 'near' 
each other. At the next level down, we iterate over the children of these boxes, and apply the 'near' versus 
'far' rule outlined above. As we have addressed the nearest image boxes using Morton ordering, we can 
conveniently access not only the relative location of an image box given its corresponding real box, but its 
entire 'genealogy'. We continue to descend the tree, iterating over only real boxes at each level, until we have 
iterated over the leaf level, at which point the interaction lists are complete. These lists are next employed 
in precomputation, in which they are used to construct a list of the necessary nearfield matrix elements and 
unique ACE translation operators that need be computed. 

3.1.3. Precomputation 

Precomputation can be broken into two stages, (i) nearfield matrix elements and (ii) ACE translation 
operators. In constructing nearfield matrix elements, we iterate over all unique source-observer pairs and 
compute elements of G"'^'"' directly. Precomputation of ACE translation operators is slightly more complex. 
All farfield interactions are sorted by the distance separating source and observer parent domains in each of 
the Cartesian coordinates. Given the regularity that our decomposition of the domain imposes (i.e. all boxes 
at a given level have the same dimensions), and that the ACE translation operators only depend upon the 
relative separation of two boxes, and not their absolute position, there is significant degeneracy among the 
ACE translation operators. Consequently, in sorting by domain separation, we can identify a subset of unique 
translation operators which can be computed once, stored, and recycled. This is particularly important in 
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periodic ACE, as the calculation of a single unique translation operator will possess -^^ — ^-^ unique 

components, each of which is an Ewald-like infinite sum (see Eqn. 20), and thus non-trivial. 



3.1.4. Tree Traversal 

The evaluation of £"^'-^^(Q^), in Eqn. (15), via tree traversal proceeds along the following five steps, 
common to any fast multipole-type method: 

Algorithm 2 ACE Tree Traversal 
1: Charge-to-Multipole (C2M): Construct multipole expansions from point sources in each leaf box. 
2: Multipole-to-Multipole (M2M): Aggregate multipole expansions at higher levels of the tree. 
3; Multipole-to-Local (M2L): Translate multipole expansions about source domains to local expansions 

about observer domains. 
4: Local-to-Local (L2L): Disaggregate local expansions at lower levels of the tree. 
5: Local-to-Observer (L20): Compute observer field from local expansions in each leaf box. 

In what follows, we provide some of the Theorems that describe Algorithm 2 in a mathematically rigor- 
ous context. First, however, as the ACE algorithm was developed in the language of Cartesian tensors, 
a brief overview of the necessary tensor notations is provided in exposition. We denote a Cartesian ten- 
sor of rank n by A*^"-*. In general, such a tensor consists of 3" components from C, indexed by the set 
{ai I i E {1, . . . , n}, a^ G {1, 2, 3}}, where an individual component is given as ylj^"' a„ • ^ totally symmetric 
tensor is one in which ^q" ..q„ is independent of any permutation on the indices, and can be represented by 
(n + 1) (n + 2)/2 independent components. Consequently, we can index components of the totally symmetric 
tensor, A*^"-*, as A [ni, n2, n^], where rii is the number of times that index i occurs and rii + n2 + n^ = n. An 
n-fold contraction between two tensors is denoted by -n-, such that C*^™^"' = A^™) • n ■ B^"). Finally, we 
denote an n-fold product of a rank 1 tensor, r, with itself by r'"-*. More details concerning Cartesian tensors 
can be found in References [17] and [44]. 

In what follows, we prescribe the computation of potentials that are observed in a domain Qq C f^^, due 
to sources in a domain fls C fi^, where the two domains are well-separated in the sense that they are in 
each others' farficld. This framework implies that potentials in flo due to other source clusters can be found 
using the same framework. Superordinate to these domains arc their respective parent domains, O^ C Q^ 
and ilo C il^. The centroids of these domains are located at r^, rf, r,,, and r^. Using these notations. Steps 
1-5 of Algorithm 2 are effected using the following five Theorems. 
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We begin with a Theorem which provides a functional definition of a Cartesian muhipole expansion at 
the leaf level. 

Theorem 1 (Charge-to-Multipole Expansion (C2M)) The potential, ip{ra), at any point r^ G fto, 
due to S sources at points r^ G fig with strength qi (i G {1, 2, . . . , S*}) can be expressed in terms of a 
Cartesian multipole expansion. 

S oo 

V-W = Y. ^^G^d^a - r^il) = E ^'"' • " • V("^G^(ra - r,), (16a) 

,3=1 n=0 

S 

M(") - ^(_l)"^(r^ _ r,)(") (16b) 

/3=1 "■ 

We next consider the manner in which the origin of a multipole expansion can be shifted, in such a way 
that aggregate multipole expansions can be constructed at higher levels of the tree, based upon extant 
expansions at lower levels of the tree. 

Theorem 2 (Multipole-to-Multipole Expansion (M2M)) A multipole expansion of S sources about 
Vs, 0("\ can be expressed in terms of a multipole expansion about the point rf. 

S n , 

M(") = ^(-l)"|(r^-rP)(")^^ E ^(r?-r.)("-")0M (17a) 

/3=1 ■ m=OP(m,n) 



S 

n! 



0(")=E(-l)"^(r;,-r,)(") (17b) 

13=1 



Here, it is important to note that Eqn. (17a) is mathematically equivalent to Eqn. (16b). In other words, we 
do not incur additional error in shifting the origin of our multipole expansion, using (17a) instead of (16b). 
It is this detail of the ACE algorithm which leads to an error which is completely independent of the height 
of the tree, a feature which is demonstrated to machine precision in [17,19,20]. 

At some level in the tree, multipole expansions in the source domain are translated into local expansions in 
the observer domain. The following Theorem expresses this process mathematically in terms of a translation 
operator. 

Theorem 3 (Multipole-to-Local Translation (M2L)) For a multipole expansion about rf, M("\ a 
local expansion L^"-* that produces the same field in i7J^ is given by: 
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V'(r„) = ^(r„-rP)(").n.L(") (18a) 

oo ^ 

L(") = ^ _m(™-") • (m - n) • V("'G^(|rP - rP|) (18b) 



n! 

m—n 



Here, the translation operator is the set of all tensors, V'''"^G^(|rP — rf |), where m G N. The elements of 
these tensors correspond to coefficients of a Taylor series expansion of the Green's function about jrg — rfj. 
For a rank p tensor component of the translation operator, we write its individual components as: 

V(P)G^(K-rf|)b„p„p,]=a^^a^-9P^G^(K-rf|) : p^+Py+p..=p (19) 

As we are utilizing the Ewald representation for G^(|r^ — rf |), we may apply the necessary partial derivatives 
to each term of Eqn. (8), yielding: 

V(P)G^(|rP-rP|)[p„p„p,]= Yl dP-dPydP^£r{\rP-rP + t{n,)\)+ ^ ^^^^^^^^^^(rP-rf ,k(n^)) (20) 

t(n^) k(n^) 

The elements of the translation operator arising from the real sum are given as: 

rri — /J.— 

47r3/2 |R,| Z^Z^ "* ^! |R|p+™ ^ ; ^ ; 

m — /i^O 

= 1^57^2.^'" ^TTT^ (C°"l°-b) (2ic) 

m — 

Here, r(ri,x) is the nth incomplete Gamma function and the coefficients. Cm' "' ^ G M. A full derivation of 
this expression is provided in Appendix C. Expressions for the terms arising due to the reciprocal sum arise 
from straightforward partial differentiation of Eqn. (10), and are given as: 



A-kAi ^ — ' /i! 



Q2(ni) 

'X 



^ a.. (2M-2m)!(2m)! 2m-2^-p,^2^-p. 

/-^ ^m^ {2fi. - 2m - pyy.{2m - pz)\ ^ \t- / \ / 

m—O 



2fi — 2m — py >0 
2m— p2>0 



,ik(n2)-R.i ' ^^ 



± \m — 1 

Ct(i;io)QQ //^ \\ 

e J^? " """ +(a(n2))''-e±"("2)fl.er/c(^^i^±»7ijj I (m = 2) (22b) 



pik(n3).R,-Q^(n3)/4,7^ 

Aia^(n3) 



(jfc,)p-«'')(ifcf-) -r— j7:r^ (a' = 3) (22c) 
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Here, Ri is the projection of R along the i-axis, En{x) is the nth exponential integral and Hn{x) is the nth 
Hermite polynomial. For /i = 1, we have assumed that the lattice lies along the x-axis, and for ji — 2, we 
have assumed that the lattice lies in xy-plane. 



As with multipole expansions, we can shift the origin of the local expansion in such a way that we can 
disaggregate local expansions centered about observer boxes at higher levels of the tree into expansions 
about observer boxes at lower levels. 

Theorem 4 (Local-to-Local Expansion (L2L)) A local expansion O*^") centered about rg can be ex- 
pressed in terms of a local expansion about the point Vg, using: 



CO , s 

L(") = y ( "" 1 0(") -(rn-n)- (r^ - rP)^"-") 
^-^ \m — nj 



(23) 



Finally, we can compute the potential at a point Tq, in flo from the local expansion centered about r,,. 
Theorem 5 (Local-to-Observer (L20)) The potential, V'lra) can be expressed in terms of\J^^^\ a local 
expansion centered about Yq, using: 



^(r„) = f]L(").n.(r„-ro)(" 



(24) 



n=0 



3.2. Theoretical Error Bounds 



One of the primary advantages of fast multipole- type methods is the possibility of deriving mathematically 

rigorous bounds. As discussed in [17], there are two sources of error in the ACE algorithm: (i) Sm due to 

truncation of the Taylor expansion of multipoles at the level at which translation occurs and (ii) e; due 

to truncation of the local expansion. Both errors can be bounded for the potentials under consideration in 

this paper. We begin by considering the error, Sm, where the multipole expansion is truncated beyond P 

harmonics. 

p 



V'(r) - Y^ M(") • n • V(")G^(|r - r^]) 



Y^ M(")-n-V(")G^(|r-rP|) 



n=P+l 



J2 M(").n.V(")|^[£.(|r-r^ + t(n^)|)+£,(|r-r^|,k(n^))] 

n=P+l 



(25a) 
(25b) 



Invoking the absolute convergence of the Ewald representation, we can swap summation and differentiation, 
and then apply the triangle inequality: 
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J2 M(").n. |^[v(")£,.(|r-rP+tK)|)+V(")£fc(|r-rP|,kK)) 

n=P+l 



em<Yl 



J2 M(").n.V(")£,(t(n^),...) 



n=P+l 



E 



J2 M(").n.V(")£fe(k(n^),...) 



n=P+l 



(26a) 
(26b) 



Here, we have separated Em into separate contributions from the real and reciprocal sums. 
Using similar arguments, we can arrive at an analogous expression for sf. 

E 

n^ n=P+l 



(27) 



^ (r - ro)(") • n • ^ M^"-"' • (m - n) • V(")£^(t(n^), . . .) 

n=P+l 



7n^n 

oo 



^ (r - ro)(") • n • ^ M^"-"' • (m - n) • V(™)£fc(k(n^ 

n=P+l )n=n 



(28a) 
(28b) 



In both cases, it is evident that the total error is bounded by the sum of errors incurred in reconstructing 
each term of the respective sums as a Taylor expansion. 

Given this form for the bound on the total error, we can derive bounds on £,„ ,,(np) and e; ,,(11^) based 
purely upon the potential being evaluated, as the codimensional dependency is implicit. Similarly, em,fc(n^) 
and £i.fc(np) can be bounded purely based upon the codimension under consideration, as the dependency on 
the type of potential is implicit. We present exemplary expressions for bounds on terms in the Coulombic 
real sum for arbitrary codimension, and for terms in the fi ~ 3 reciprocal sum for an arbitrary potential. 
The bound on Coulombic terms is given as: 



£m,r{^fj.) ^ ty 



,^+1 



47r3/2(P+l)!|R| 



(^+1) 



T{P + 3/2, (1 - a)r/2|R|2) r(P + 5/2, (1 - a)r/2|R|2) 



(29) 



(1 - a)^+3/2 (1 - a)^+5/2 

Here, a — \ri,max\/\^\ < 1, where |R| = |r — r^ + t(n^)|, |ri^ma^| is the distance from the source box center 
to its furthest point source, and C G M^. A detailed derivation of this bound is provided in Appendix D. 
A similar procedure can be used to derive bounds on the Helmholtz and Yukawa potentials. The bound on 
the /Li = 3 reciprocal terms is given as: 



em,*;(n^) < C 



(k(n3) • ri 



^P+l 



(P + 1)! 



,-a"(n3)/V 



Asa'^ins) 



(30) 



While the proof is straightforward, it is also furnished in Appendix D. In both cases, the bound on the terms 
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contributing to si can be derived following a procedure similar to that outlined in Appendix D, as well. 

For physically relevant parameters, we find that the dominant contributions to the error arc due to terms 
in the sum on the interval < |n^| < 2, as one may intuitively expect on the basis of the rapid convergence 
of the Ewald sum. Further, we can see that as P — > oo, the error in each term can be made arbitrarily small. 

While bounds can be derived for the other potentials and codimensions under consideration, they are 
considerably looser, and thus ommited. One of the primary reasons for this looseness is the anisotropy with 
which we expect our expansions to exhibit in their convergence. This expectation is based upon two sources 
of anistropy, the discrete rotational invariance of the lattice, and variations in the behavior of the Green's 
function in and out of the lattice. The former source will be prevalent in all of the potentials considered 
in this work, whereas the latter is limited to situations in which the codimcnsion of the lattice is non-zero. 
This stands in contrast to the problems to which ACE has previously been applied, wherein the kernels of 
the associated potentials are spherically symmetric, and error bounds are isotropic. In Section 4, we present 
numerical error convergence data to provide a more practical perspective on the accuracy of our expansions. 

3.3. Computational Complexity 

The computational cost of the ACE algorithm has been previously analyzed in detail for non-periodic 
problems [17]. Here, we provide a brief review of the dominant costs and highlight minor differences which 
arise in adapting ACE to periodic systems. In doing so, we consider a primitive cell in which N co-located 
source/observer points are randomly distributed. These points are mapped onto an Ni level tree, where the 
number of boxes at level I is Bi and -Bj_i = 8Bi. The average number of unknowns per leaf box is denoted s, 
i.e. N/s = Bi, and the total number of boxes at all levels X^jJi Bi ^ — ^ Bi. The total cost of evaluating 
Eqn. (15), truncating all expansions at Pth order, can be broken down into nearfield (NF), C2M, M2M, 
M2L, L2L, and L20 costs, each of which is summarized below: 

(i) NF: s^ operations per 'nearfield' neighbor X 27 'nearfield' neighbors per leaf box X Bi leaf boxes: C'nf = 27Ns 
(ii) C2M: s operations per component X ~ ^r- unique multipolc components X Bi leaf boxes: Cc2M = ~rP^ 
(iii) M2M: -^ ^ operations per M2M translation X ~ _Bi M2M translations: Cm2M = —Yoa 



(iv) M2L: ^ =^ operations per 'farfield' neighbor X ~ 56 'farfield' neighbors per box X ~ _Bi boxes: Cm2L =56- 
(v) L2L: Same as M2M: Cl2L = t^ 
(vi) L20: Same as C2M: Cl20 = ^P^ 
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This leaves us with the total cost: 

Ctot = N [27s + — + 58^^ j ^ 0{N) (31) 

The multiplicative factor behind N is minimized if we choose a density of s « ^j^ . As discussed in Section 
(3.1.1), rather than specifying a minimum box size to achieve this ideal density of unknowns prior to 
constructing the tree, we specify the number of levels, Ni. Knowing that Bi = 8^'^^, this ideal density of 
unknowns is then realized for A*"; = [l + logg(ip^)] . 

Excepting an improvement that we have made in our algorithm since the publication of [17], this is es- 
sentially the same cost structure. In our original publication, the number of 'farfield' neighbors per box is 
given as 189, here we have reduced it to 56. This is achieved by taking advantage of the exact up/down tree 
traversal operations in ACE. In situations in which a box is in the 'farfield' of all of the children of a given 
parent, the M2L translation will occur at the level of the parent. For a given box, there will be 27 — 8 = 19 
such parent level interactions, and 64 — 27 = 37 child level interactions, leading to 56 such interactions in 
aggregate. This change in the way 'farfield' interactions are treated is independent of whether ACE is being 
used for periodic or non-periodic problems. 

The only significant difference in cost between periodic and non-periodic ACE is in the prefactors in 
the C2M and M2L costs. The 27 'nearfield' neighbors per leaf box and 56 'farfield' neighbors per box are 
upper bounds, and in the case of non-periodic ACE, only realized for boxes that have no faces touching the 
boundary of the computational domain. For boxes on the boundary, however, both the number of near and 
far interactions will be reduced, the net effect of which is a practically negligible change in the optimal value 
for s and the overall prefactor. In periodic ACE, however, boundary boxes essentially participate in near 
and far interactions as if they were on the interior of the computational domain, given our revised rule for 
filling interaction lists. In this sense, periodic ACE is actually closer to the idealized cost given above. 

4. Results 

In this Section, we present the results of numerical experiments carried out using the ACE algorithm. 
These results are intended to validate the utility of our method as a fast and accurate means of evaluating 
the periodic potentials discussed in this paper. Error convergence tests were performed on a desktop com- 
puter with a dual core Intel Pentium D clocked at 3.20GHz with 3GB RAM, running Linux OS. Scaling 
tests were performed using a single node at the Michigan State University High Performance Computing 
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Center (HPCC). Each node is equipped with two quad core Intel Xcons clocked at 2.93GHz with access to 
47GB RAM. All code was compiled using the Intel Fortran Compiler. We have not exploited any degree of 
parallelism in generating these results; HPC resources were employed to raise the ceiling on the number of 
unknowns for our scaling tests. 

4.1. Error Convergence 

In demonstrating error convergence, we consider only the contributions to the total potential arising 
due to farfield interactions. ^;^'^^ — £"^'-^^(Q^) is computed using the ACE algorithm whereas \I'*'''^'=* = 
Gq/3Q^ — G2.%°'^Qi3 is computed directly. We do not explicitly compute this difference, but simply ignore 
nearfield pairs in evaluating the potential. The relative error in the L2-norm is reported in all numerical 
experiments: 



\-qjJ^CE _ \^direct\2 



^ / I a a l_ ^QO^ 



As farfield contributions will tend to be slightly smaller than those due to nearfield interactions, this is in 
some sense a 'worst-case' metric for error, and one can typically expect an order of magnitude improvement 
in the error for the entire potential. 

We first demonstrate that we can achieve arbitrary precision using the ACE algorithm by increasing the 
order above which our expansions are truncated, P. Results are presented for each of the 9 permutations of 
potentials/codimensions; all computations are done for a random distribution of co-located source/observer 
points. In all tests, the locations of 1000 sources were chosen from a random distribution on a line (/i = 1), 
plane (/i = 2), or cube (/i = 3). The magnitude of the source was chosen at random from a uniform distribu- 
tion on the interval [0, 1). All Ewald sums were evaluated to a relative error of egf = 10^^. The error, Sfar 
is calculated under these conditions as P is varied from 1 to 11, and the results for /i = 1,2,3 arc given in 
Tables 1, 2, and 3, respectively, for all 3 potentials. From these results, it is evident that the error in approx- 
imating all 9 potentials with ACE expansions decreases uniformly as P increases. In particular, we find that 
in all cases e ranges from ^ 10^^ for P = 1, to between ^ 10^'"' — 10^^ for P = 11, in all experiments. As is 
evident from these tables, the error decreases uniformly with increase in P; a more detailed P dependence 
can be gleaned from these Tables. In Table 4, we demonstrate convergence in P for the Helmholtz potential 
for /i = 1 and fi = 2 lattices in which sources are distributed outside of the dimension of periodicity, i.e. over 
a cube rather than a plane or a line. In both cases, we find that the error convergence exhibits behavior sim- 
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ilar to that found in Tables 1 and 2. One point of clarification is necessary in presenting convergence results 
for truncated infinite sums. In some cases, the relative error in the potential apparently exceeds the relative 
error at which the sums are truncated {egf = 10^"''). This is not anomalous in as far as the errors presented 
give an indication of how well the ACE expansions converge to the necessarily finite order approximation to 
the actual potential. Next, we present results for the Helmholtz kernel for ji = 2 subject to quasi-periodic 
boundary conditions, the details of which can be found in Appendix E. In Table 5, wc find that at a fixed P, 
farfield error is largely independent of the quasi-periodic phase angle, in as far as error is observed to be of the 
same order of magnitude for all parameters. This robustness opens our method to a number of applications in 
electromagnetics/optics, including oblique scattering from metamaterial structures or photonic crystal slabs. 

In demonstrating the accuracy and applicability of the ACE algorithm to a wide range of practical prob- 
lems, it is interesting to analyze convergence of the ACE expansions at a fixed value of P as k varies. Recall 
that K is inversely proportional to the screening length for the Yukawa potential, and the wavelength for the 
Helmholtz potential, motivating the introduction of a unitless length scale, A = 27r/(K(^^)^/^)). This scale 
roughly defines how rapidly the potential will vary over a unit cell, viz. the number of wavelengths in a unit 
cell for the Helmholtz potential. That the ACE algorithm maintains a high degree of accuracy at a fixed value 
of P for a broad range of relevant A values for both Helmholtz and Yukawa potentials is made evident in Fig. 
(3). Here we have distributed 1000 unknowns over a cube of unit volume and evaluated Sfar, at a fixed value 
of P = 6, for both the Helmholtz and Yukawa potentials with /i = 2 for A ranging from 0.5 to 1024. The 
error, £fan decreases uniformly for A > 0.5 from efar ^ 10^^ — 10^^ at A = 0.5 to efar ^ 10^^ at A = 1024. 
This is essentially due to the fact that as A increases, slowly varying terms become increasingly dominant 
in the Green's function. That the error convergence appears to outperform the Coulombic (A — > oo) result 
in Table 2 is due to the fact that we have extracted the n^ = term from all Coulombic calculations, as 
discussed towards the end of Appendix A. To examine the degradation in £far as A decreases, at fixed P, 
it is useful to consider previous results and analysis for the non-periodic Helmholtz kernel, as presented in 
[20]. Here, the authors allude to a decrease in the efficiency of the ACE algorithm at a fixed accuracy, i.e. 
as A becomes small relative to a box length, P must be increased to achieve the same relative error. This 
type of behavior is similarly evident in the periodic Helmholtz and Yukawa potentials. In Table 6, £far is 
calculated for a random distribution of 1000 points distributed over a planar unit cell [A^j. = 1) for both 
periodic and non-periodic [20] Helmholtz potentials at a fixed value of P = 8. This test is simply intended 
to demonstrate that the periodic and non-periodic ACE expansions degrade at approximately the same rate 
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as wavelength is decreased at fixed P. While increasing P somewhat alleviates this, as A decreases, P will 
become prohibitively large at some point in maintaining a desired level of accuracy, limiting the viability of 
ACE. 

To this end, it is worthwhile to examine this apparent limitation in the context of the physical relevance 
of small values of A. Consider A = 0.5 (k — Air), the value for which efar is highest in Fig. (3). For 
the Yukawa potential, A = 0.5 corresponds to a strongly screened system, in as far as second nearest 
neighbor cells will introduce a correction to the potential less than 0.001% of the correction due to the 
nearest neighbor cells. In such a strongly screened system, the interactions are better handled by short- 
range acceleration techniques like linked-cell methods [14]. Physically relevant values of A will instead be 
on the order of or greater than a primitive cell length {Ap, ), i.e. a weakly screened system for which the 
ACE algorithm is demonstrably accurate. For the Helmholtz potential, rather than strong screening, small 
values of A correspond to increasingly oscillatory behavior. Fortunately, most physically relevant conditions 
for the periodic Helmholtz potential involve values of A which correspond to systems in which the period of 
oscillation is on the order of, or greater than a primitive cell length. Typically frequency selective structures, 
metamaterials, and photonic crystals, to name but a few applications, consist of subwavclcngth unit cells 
(i.e. A > 1 on our scale), to which the ACE algorithm is also very well-suited. 

4.2. Scaling 

As with error convergence, in demonstrating linear scaling we provide timings for the calculation of 
the farfield contributions to the total potential, and explicitly demonstrate that the remaining number of 
nearfield interactions will scale as 0{N). Tables 7, 8, and 9 contain a number of metrics which are indicative 
of the speed-up achieved in using the ACE algorithm. These metrics are as follows: 

(i) tACE- Time required for tree traversal, i.e. the execution of steps outlined in Algorithm 2 
(ii) tpre- Time required for the precomputation of ACE translation operators 

(iii) N unique'- The number of unique translation operators that must be evaluated in precomputation 
(iv) tfar'- Time required for the direct evaluation of farfield interactions (i.e. computing and storing non-zero 

elements of G^^' = G„^ - G^^/^ 
(v) t direct'- Time required for the direct calculation of the farfield contribution to the potential (i.e. evalu- 
ation of \l>*''ect as a matrix-vector product) 
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(vi) Nnear- Thc remaining number of nearfield interactions (i.e. non-zero elements of GJ^o"'') 

In all 3 tables, potentials are evaluated for P = 7, yielding a farfield error 0(10^^). Timings with prepended 
tildes were extrapolated based upon multiplication by a factor determined by thc increase in the number of 
farfield interactions. In all problems, the primitive cells are of unit dimension, and for the Helmholtz and 
Yukawa potentials, physically realistic values of k {2tt and 27r/10, respectively) were chosen. All Ewald sums 
were evaluated to a relative error of 10"^, in both ACE and direct calculations. In all tests, the average 
number of unknowns per leaf box is fixed at 64. 

In all cases, a considerable speedup in evaluating the farfield contribution to the potential is achieved, 
ranging from a factor of 12 for 1024 unknowns with fi — 1 to 6 million for ^ 1 million unknowns with fi = 2. 
In all cases, the breakeven point is found to be rather low; 540 unknowns for /i = 1, 570 unknowns for /i = 2, 
and 1730 unknowns for /i = 3. It is evident in all 3 tables that 0{N) scaling is achieved in tree traversal. A 
linear regression on a log-log scale yields a scaling exponent which differs from 1 by less than 3% in all cases 
under consideration. Similarly, linear scaling is apparent for Nnear , with a scaling exponent that is also very 
nearly unity; the specific exponent is given in the caption to the Tables. Finally, the precomputation time, 
tpre exhibits clear sublinear scaling in A^ for all cases. The cost of the precomputation is typically ignored in 
the literature, usually being dismissed as a one time cost, negligible relative to the repeated calculation of 
potentials from the same tree. While this is often the case for non-periodic potentials, as the evaluation of 
the requisite periodic translation operators is seemingly non-trivial (i.e. many infinite sums), it is important 
to demonstrate that this step can be completed on a time scale that is not prohibitively long, relative to a 
single tree traversal. One final point worth noting is that we have intentionally chosen a non-optimal value 
for the density of unknowns per leaf box at P = 7. Cost is optimized for a density of ^ 20 unknowns per box, 
whereas our tests were run at 64 unknowns per box. As there is less control over thc density of points per 
leaf box, due to thc manner in which the tree is constructed (i.e. number of levels instead of leaf box size), it 
is important to demonstrate that both scaling and a low breakeven point are maintained for densities away 
from the optimum. At more optimal densities, the scaling exponent will be closer to 1 and the breakeven 
points will decrease, as well. 
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5. Conclusion 

In this paper, we have presented extensions of the ACE algorithm which realize the 0{N) calculation 
of Hclmholtz, Yukawa, and Coulomb potentials on singly, doubly, and triply periodic lattices. The results 
presented demonstrate error convergence as well as considerable acceleration. Further work is being sub- 
mitted elsewhere to demonstrate the applicability of our method to the analysis of electromagnetic wave 
propagation [31]. We are presently working on adapting these techniques to solid-state electronic structure 
calculations involving defects. 
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Appendix A. Representations of the Periodic Green's Function 

In this Appendix, we discuss different representations of G^dr — r'|), in particular the direct, spectral, and 
Ewald forms. Perhaps the simplest of these is the direct representation, in which G^(|r — r'|) is furnished 
by a sum of the relevant free space Green's function, G(|r — r'|) over £^: 

G^(|r-r'|)= J2 G{\r-r' + t{n,)\) (A.l) 

t(n^)e£^ 

For the PDEs specified in (1), the proper free space Green's functions are: 

G(|r - r'l) == — ■ (Hclmhohz Equation) (A. 2a) 
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G(|r - r'l) == — ■ (Yukawa Equation) (A.2b) 



G(|r - r'l) = — (Poisson Equation) (A. 2c) 

It is evident that, for the Helmholtz (A.l) is conditionally convergent, and for the Coulomb potential it 
is manifestly divergent (a topic which is discussed in more detail, later). While this is not the case for 
the Yukawa potential, for very small values of k (i.e. l/n >> max{|ai| \i G l,..,fi}), (A.l) may be slowly 
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convergent relative to other representations. Alternatively, we might pursue a spectral representation of 
G^(|r — r'l), in which case we transform the sum over C^, to a summation over its reciprocal lattice, C* . 
This is typically accomplished by way of the Poisson summation formula: 

^ /(r + t(n^)) = ^ E e^'^^""^"-^ (k(n^)) , where F (k(n^)) = / d^r'e-*("-)-"-'/(r') (A.3) 

Typically, this identity is applied to Eqn. (A.l), directly yielding the spectral representation of the peri- 
odic Green's function. Rigorously, however, this is only permissible for the Yukawa potential, as the direct 
representation is a conditionally convergent sum. Uniform convergence of the summation on the left hand 
side of (A.3) is in fact a necessary condition for this identity's validity [45]. This deficiency can be circum- 
vented by inserting a convergence factor, as in [34], which yields effectively the same result as the Poisson 
summation, excepting singular contributions which need be regularized depending upon the application. 



Gi(|r-r'|) = -i^ ^ e^'^("^)--'ifoWni)|r,|) 



^^(i^-^'i)-^ E ^ 



k(ni)e£* 
1 .,_^ ., , , g-a(n2)|rt 



(m = i) 


(A.4a) 


{^, = 2) 


(A.4b) 


(/i = 3) 


(A.4c) 



2-^2,, v.. "("2) 

G3(|r-r'|) = ^ ^ e^Mn3)-r, 1^ 

Here, r; and r^ are the components of r — r' which lie in and out of the lattice, respectively, and the functional 
form of a{n^) is given in Eqn. (11a). For /i = 1 or 2, the sums in (A. 4) exhibit spectral convergence away 
from the lattice, i.e. |rt| > 0. However, for |rt| — > or /i = 3, these sums have poor convergence properties 
which considerably limits their numerical utility. 

To achieve a representation of G^{\v — r'|) which is absolutely and rapidly convergent both near and 
far from the lattice, we turn to Ewald's method [32]. In the Ewald representation, the Green's function is 
separated into two rapidly convergent sums, one on >C^, the other on C*. Our derivation begins with the 
following integral representations of the free space Green's functions: 
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G(|r-r'|).^/ci.e-^l-'l^+^ 









oo 

G(|r-r'|) = ^|d.e-^>-'l 



(Hclmholtz) 


(A.5a) 


(Yukawa) 


(A.5b) 


(Poisson) 


(A.5c) 





One means of arriving at the Ewald representation is to split this integration on [0, oo) into separate integrals 
on [0, rj] and [rj, oo), where rj G K.^. We can then represent the periodic Green's function as: 

V 



G,i\r-r'\ 



^ J ds g is, \r-r' + t{n^)\)+ ^ J ds g{s, \r - v' + t{n^)\) (A.6) 

Here, the form of Q{s, |r — r' + t(n^)|) is evident from Eqn. (A. 5). We exchange summation and integration, 
and then apply the identity in Eqn. (A. 3) to the first integral: 

J2 Jdsg{s,\r-r' + t(n^)\)^ ^ j ds j d^'r" e-'''^^->''" g{sy'\) (A.7) 

Evaluating all integrals, we are left with the Ewald representation of the periodic Green's function: 

G^(|r-r'|)= Y. ^/c(r-r',kK))+ ^ Er{\v - v' + t{n,)\) (A.8) 

The functional forms of £^(1^ — r' + t(n^)|) and £k{v — r', k(n^)) are given in Eqns. (9) and (10). 



While we ignore singular contributions to the potential throughout this paper, we provide a brief discus- 
sion of them here for the sake of completeness. The most evident singular contributions come about due to 
the so-called self-terms (i.e. |r — r'| ^- 0) which we have subtracted out in Eqn. (14b). The exact manner 
in which this is regularized in practice is largely application dependent. In calculating potential energies in 
a Coulombic system, a Laurent expansion of the Green's function is employed, and the term which scales 
as T^—pi is simply negated [34]. In the context of some integral equation discretization schemes, such as the 
Method of Moments, the singularity is regularized by the integration measure associated with the source 
and testing integrals. 

Another type of singular contribution can arise due to the situation in which a{r\fj) — > 0. Again, the 
manner in which these behaviors are regularized are application dependent. In Coulombic systems, this 
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singular contribution vanishes for charge neutral primitive cells, up to a correction proportional to the dipole 
moment of the primitive cell [34]. As one of our applications of interest is in studying electronic defects 
in which the primitive cell is not charge neutral, we have not guaranteed charge neutrality in numerical 
experiments involving Coulombic potentials. Instead, we simply ignore the a(n^) — > term of the Ewald 
representation of the periodic Coulombic Green's function. As this contribution is spatially uniform, it does 
not affect our error convergence. 

Appendix B. Evaluation of Ewald Sums 

In evaluating the periodic Green's function in the Ewald representation, as well as the periodic ACE 
translation operators which have the form of an Ewald sum, a few considerations are necessary to achieve 
optimal results, namely (i) the order in which terms are added, (ii) criteria for truncating the summation, 
and (iii) the choice of an appropriate splitting parameter, 77. The calculation of G^(|r — r'|) begins with 
the contribution due to n^ = 0, and subsequent terms are added on over surfaces of constant |n^| (points 
— > /i = 1, circles — > /i = 2, and spherical shells — > /i = 3) for increasing |n^|. We denote the partial sum over 
the surface for which In,, I — m as: 



G^(|r-r'|)U= J2 f.(|r-r' + tK)|)+ ^ ffe(r - r',kK)) 



(B.l) 



Such that: 



t(nM) 



G^(|r-r' 



|n^|=m 



Eg.( 



(B.2) 



Our criterion for the convergence of this sum is given in terms of the relative convergence of the Mth partial 
sum in the L2 norm: 
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E 
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E G4|r-r' 



< eoF 



(B.3) 



In all of the results presented in this paper, Egf = 10 ^ for both the direct evaluation of the Ewald sum, as 
well as the ACE translation operators. 



The only remaining consideration is 77, which controls the relative rate of convergence of the real and 
reciprocal sums. As 77 increases, the contribution of the reciprocal sum to the overall convergence of the 
Ewald sum is increased. In the literature, an optimal value of 77 is considered to be the one for which the 
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(m = i) 


(B.4a) 


(m = 2) 


(B.4b) 


(/i = 3) 


(B.4c) 




(B.4d) 



reciprocal and real sums have the same asymptotic rate of convergence [36,37,46]. This optimal value of rj 
has different forms for different values of ^. 

Vopt = v/7r|ai|"^ 

|ai|2 + |a2|2 + |a3|2 



In practice, we have found that using the same r/ for the Green's function itself and the ACE translation 
operators delivers ideal performance in terms of both speed and accuracy. Unless otherwise indicated, we 
employ this optimal value for ry in all numerical experiments. There are conditions under which alternative 
values of r] must be used for the Helmholtz potential. It is well-established in the literature that using r]opt 
for the periodic Helmholtz potential will lead to a 'catastrophic loss' in accuracy, due to finite precision 
arithmetic, for situations in which the unit cell is on the order of, or larger than a wavelength. Methods to 
mitigate this loss in accuracy at the expense of sub-optimal convergence have been proposed [36,38,47], and 
are utilized when appropriate. 



Appendix C. Derivation of Real Sum ACE Translation Operators 

In this Appendix, we derive the expressions given in Eqn. (21) for the translation operator components 
arising due to the real sum in the Ewald representation of G^(|r — r'j): 

V(^)£.(|r? - r? + tK)|) [p.,py,p.] = dP-dPydP^EriK - r? + tK)|) (C.l) 

While straightforward partial differentiation of the expression given in Eqn. (9) will yield a viable expression, 
we find that a more computationally efhcient expression can be arrived at by manipulating the integral 
representation given in Eqn. (A. 6). For the periodic Helmholtz potential, wc proceeed as follows: 
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oc 

V(P)£.(R) [p.,Py,p.] = ^Jds dP^dP^ydP'e-'''\^\'+^ (C.2a) 

V 

oo 

= ^y^s {-s)PHp,{sR,)Hpy{sRy)Hp,{sR,)e-'"\'^^"+f^ (C.2b) 

V 

oo p 

-''~^^^- fds ^ C'^-P«'P-s''+'"e""'l^l'+^ (C.2c) 



27r3/2 , 

Here, Cm' "' ^ is the coefficient of the term which is ?Tith order in s in the product of Hcrmite polynomials 
in Eqn. (C.2b). To evaluate this integral in closed form, we expand the exp{K'^ / 4s'^) term in a Laurent series 
in s. 

m— fi—0 



V('')£.(|R|) [p.,py,p.] ^L^JdsJ2 C'm-'-'^ E Mf)!::.P+™e-^l^l^ (C.3a) 



»; 



= Li)! J2 E ^™ '""'"^ / '^^ ^^L^^P+^e-^'l^l' (C.3b) 

_ (-1)^ >^ ^ p....„... (^^|R|V4)^ r (H±^ - ^,,2|R|2) 

Here, T{n,x) is the incomplete Gamma function of nth order. These steps can be repeated for the Yukawa 
and Coulomb potentials, yielding the following expressions: 

I I m— /J,— ' ' 

(C.4a) 

- (-1)^ V V C^^-r>^.-r> A-'\^\y^rn'-^r^-^^,v'\^n .Yukawa) 

- 4^3/2 |R|Z.Z.^™ , |R|p+™ (Yukawa) 

(C.4b) 

/ np P J. ( p+m+l „2|-D|2\ 

= Jzl^ y C^..p..p. ^ ^ 2 ^^ 1^1 ) (Coulomb) 

47r3/2 R ^^ " RP+™ ^ ^ 

' ' m— 

(C.4c) 

All sums over incomplete Gamma functions are rapidly convergent for physically relevant values of the 
arguments, and recurrence relations are utilized to rapidly compute each term. 
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Appendix D. Derivation of Error Bounds on ACE Expansions 



D.l. Bounds on Terms in the Real Sum for the Coulomb Potential 



Using Eqns. (21c) and (26b), the error incurred in approximating the n^th term of the real sum by 
truncating the M2L expansion above Pth order is given as: 



^m.r\'^^) — 



E M 



(«) . r,. . 



-P+1 



(-1)" V r(")£lllt=i^W) 
Rl ^„ " 



47r3/2 



m=0 



|R|"+'^ 



(D.l) 
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Here, Cm is a tensor whose components consist of the products of Hermite polynomials given in Eqn. 



(C.2b). For an arbitrary nth rank tensor A^"^ the following inequality holds: 



A(") • n • Cl: 



<Ch 



A(") • n ■ 



\R\ 



As r(n,x) increases monotonically in n, the following simple inequality will hold: 

J2 K(«)£ll3^^^W) < ^j,(„) ^ r(n+ i,,2|RP) 



m— 



|R|2« 



|R|2" 



Combining the previous two inequalities, we can manipulate Eqn. (D.l) into the following form: 



em,r(n^) < C 



£ M^") n j,(n) (-l)"»r(n + l,,2|R|: 



n=P+l 



47r3/2|R|2"+i 



(D.2) 



(D.3) 



(D.4) 



Next, we recall the following inequality for an arbitrary nth rank tensor, A'"^ contracted with a Multipole 
expansion in which the furthest point source from the origin is at position Vi^max [17]: 



|A(") . n • M(")| < C^|A(") • n • rfj,, 



(D.5) 



Applying this inequality to Eqn. (D.4): 



em,r(n^) < C 



°° rl"l. ^ ^(„,(-l)"nr(n+l,r;2|R|2) 



E 



n 



47r3/2|R|2"+i 



(D.6) 



n=P+l 

Using the integral representation of the incomplete Gamma function and exchanging summation and inte- 
gration, we are left with the following: 



em,r(n^) < C 



dt j: 



oo (n) 

i.vaax 



^2|R|2 



n=P+l 



(„) (-l)»nt"-V2^-t 
n! 47r3/2|R|2"+i 



(D.7) 



Applying what is essentially the Cauchy-Schwartz inequality to vf^^^^ ■ n ■ R'"% as in [17]: 
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em,r(n^) < C 
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Resolving the infinite sum inside of the integrand: 

\r-. 1^+1 



£m,r(n^) < C 



47r3/2(P+l)!|R|^+2 



dt t^+i/2 p + 1 



IRI 



r,2|R|2 



t e 



(D.9) 



We adopt the notation, a = i^'t'° , and note that a < a/3/4. In the ACE algorithm, |R| > 2dxo, based 
upon the criterion for 'farfield' interactions and |ri.„iaj;| < y3/2dxo, for a box size of dxQ. Evaluating this 
integral, we arrive at the following final expression for our bound: 

^^+1 r(P + 3/2,(l + a)7?2|R|2) r(P + 5/2,(l + a)r72|R|2) 



£m,r(ll/^) S: (-" 



47r3/2(P+l)!|R| 



(^ + 1)- 



(1 + a)P+3/2 (1 + a)^+5/2 

We note that a looser, but monotonically decreasing bound is given by: 
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(1 + a)^+3/2 (1 + a)^+5/2 
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D.2. Bounds on Terms in the Reciprocal Sum for ^ — 3 



Starting from the expression for V^£fe(|R|) given in Eqn. (22c): 
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J2 M(") • n ■ (ik(n3)) 
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Using the inequality in Eqn. (D.5): 



em,fc(n3) < C 



(k(n3) ■ r^.maa:)^^^ 

(P + l)! 
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Appendix E. Quasi-Periodic Boundary Conditions 



Quasi-periodic boundary conditions typically arise when considering Helmholtz-type problems in which 
an array of scatterers is excited at oblique incidence. We characterize this excitation in terms of a plane 
wave of the form exp(i'ko ■ r), where |ko| = k, the effect of which is manifest as a phase factor applied to the 
potential when its argument is translated by a lattice vector. 



^(r + t(n^)) = e'ko-t(„,)^(r) . t(„^) ^ ^^ 



(E.l) 
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This stands in contrast to the standard periodic boundary condition given in Eqn. (6). This modification of 
the boundary conditions simply necessitates the use of a quasi-periodic Green's function and its associated 
translation operator. No further modifications of the presented algorithm are necessary to give consideration 
to this class of potentials. The quasi-periodic Green's function, G^^ko(k ^ i"'l)i can be written in terms of 
the Ewald representation of the periodic Green's function (Eqns. (9) and (10)) as follows: 

G^,ko(|r-r'|)= J2 e^'^°-*("-)£.(|r-r' + tK)|)+ ^ £,(r - r',kK) + ko) (E.2) 

This same transformation can be applied to the expressions for the periodic translation operator (Eqns. (21) 
and (22)) to arrive at expressions for the quasi-periodic translation operator. We note that while we only 
present results for the quasi-periodic Hclmholtz potential, this approach is sufhciently general that it can 
be applied to quasi-periodic Coulomb or Yukawa potentials, should these expressions be relevant for some 
application. 
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Fig. 1. An exemplary periodic domain on a 2-dimensional lattice. The central primitive cell and its nearest image cells are 
illustrated. Spheres are intended to represent p(r) in such a way that its periodicity is evident. 
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Fig. 2. The 'near' (green) and 'far' (light green) interactions for source boxes (dark green) at different levels of the tree. Sources 
and observers are co-located, with green points corresponding to sources defined in Q^ (i.e. the primitive cell), and red points 
corresponding to their periodic images. Red points/boxes are not actually stored, but need be considered in constructing 
interaction lists, as discussed in Section 3.1.2. 
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Fig. 3. Error convergence as a function of A for 1000 points randomly distributed over a cube of unit volume for fi = 2 a.t a 
fixed value of P = 6. 

Table 1 

Convergence of farfield error as a function of the order of ACE expansions, P, for 1000 point sources randomly distributed 



1. 


K = 2tt for Helmholtz and Yukawa potentials, and |ai 


= 1. 


p 


Sfar (Helmholtz, 11 — I) 


Sfar (Yukawa, /i = 1) 


Sfar (Coulomb, ^ = 1) 


1 


2.65262170E-01 


3.42134128E-01 


2.01244948E-01 


3 


1.86477440E-02 


7.38115507E-02 


2.22274736E-02 


5 


1.24733083E-03 


1.30746253E-02 


3.30542636E-03 


7 


6.76186697E-05 


2.28421189E-03 


5.58720137E-04 


9 


1.42733438E-05 


4.18448553E-04 


1.01927872E-04 


11 


2.69372066E-06 


8.04177927E-05 


1.95815575E-05 



Table 2 

Convergence of farfield error as a function of the order of ACE expansions, P, for 1000 point sources randomly distributed over 

a plane and fi = 2. k = 2it for Helmholtz and Yukawa potentials, and |ai| = |a2| = 1. 



p 


Sfar (Helmholtz, fi = 2) 


Sfar (Yukawa, yU = 2) 


Sfar (Coulomb, ^ = 2) 


1 


2.51998705E-01 


2.47605092E-01 


1.43762519E-01 


3 


2.26492689E-02 


3.36631422E-02 


6.39293888E-03 


5 


1.05073461E-03 


3.70471581E-03 


6.66751431E-04 


7 


3.47767989E-05 


5.51356326E-04 


1.21687245E-04 


9 


1.43545891E-06 


8.53940482E-05 


1.72698036E-05 


11 


8.11541488E-07 


2.82777859E-05 


7.68416578E-06 
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Table 3 

Convergence of farfield error as a function of the order of ACE expansions, P, for 1000 point sources randomly distributed over 

a cube and /i = 3. k = 2tt for Helmholtz and Yukawa potentials, and |ai| = |a2| = |a3| = 1. 



p 


Sfar (Helmholtz, ^ = 3) 


£far (Yukawa, /i = 3) 


Sfar (Coulomb, fi — S) 


1 


9.23618334E-02 


2.01036772E-01 


9.98503574E-02 


3 


8.62434445E-03 


1.58866117E-02 


1.41604955E-03 


5 


4.24578257E-04 


2.52572255E-03 


2.66501969E-04 


7 


1.71411755E-05 


8.50629227E-04 


6.88847400E-05 


9 


2.83889405E-06 


1.88337597E-04 


2.21046124E-05 


11 


2.33785092E-06 


7.19361796E-05 


7.88897392E-06 



Table 4 

Convergence of farfield error as a function of the order of ACE expansions, P for 1000 point sources randomly distributed over 

a cube of unit volume for fJ. = 1 and fi = 2. Both data sets are for a Helmholtz potential with k = 27r and |ai| = |a2| = 1 



p 


Efar (Helmholtz, yU = 1) 


Efar (Helmholtz, ^ = 2) 


1 


2.24897426E-01 


2.23438106E-01 


3 


2.16007507E-02 


1.99308663E-02 


5 


1.20343605E-03 


9.69744947E-04 


7 


5.26344870E-05 


2.99860610E-05 


9 


1.22053802E-05 


4.86599084E-06 


11 


3.32008243E-06 


2.67447745E-06 



Table 5 

Variation in farfield error as a function of incidence angle for quasi-periodic boundary conditions. For this test, P = 7 and 
the geometry consists of 1000 point sources randomly distributed over a plane with |ai| = |a2| = 1. We are concerned with a 
Helmholtz potential with fi = 2 and k = 2n. 



(deg.) 


efar (0 = 0) 


Efar (0 = 22.5) 


Sfar {<P = 45) 





1.34745750E-05 


1.34745750E-05 


1.34745750E-05 


15 


1.27088420E-05 


1.91878307E-05 


1.76093009E-05 


30 


3.61876368E-05 


5.18579773E-05 


2.92526545E-05 


45 


1.33004484E-05 


3.65870039E-05 


1.28715043E-04 


60 


1.39020493E-05 


1.86621752E-05 


3.71872105E-05 


75 


1.60122213E-05 


2.53068822E-05 


2.93652039E-05 


89 


2.86977094E-05 


5.52665702E-05 


7.47714207E-G5 



37 



Table 6 

Convergence of farfield error in the L2-norm as the size of the unit ceU is increased relative to the wavelength for both periodic 
and non-periodic ACE. Test consists of 1000 point sources randomly distributed over a plane for fi = 2 where |ai| = |a2| = 1, 
at a fixed number of harmonics (P=8). 



A 


Free Space 


2D Periodic 


2.00 


1.393139705E-005 


6.799602032E-006 


1.33 


1.581618372E-005 


7.475801890E-006 


1.00 


1.704968274E-005 


1.134334351E-006 


0.80 


2.490816540E-005 


4.231144671E-005 


0.67 


1.419596242E-004 


1.478215770E-004 


0.57 


6.760960910E-004 


2.974738916E-004 


0.50 


2.798975397E-003 


1.003281509E-003 


0.33 


2.845979667E-001 


1.469393475E-001 


0.25 


6.951138039E+000 


1.390723256E+001 



Table 7 

Scaling with the number of unknowns, N for Yukawa point sources randomly distributed over a plane with ^ = 1. A linear 

regression on a log-log scale indicates Iace ~ l\fi-029 ^^^ Nn^ar ~ ]\fi-0'iO^ 



N (Ni) 


tACE (sec) 


tpre (sec) 


^^unique 


tfar (sec) 


tdirect (sec) 


Nnear 


1024 (3) 


2.00E-02 


1.16E-00 


34 


1.46E-h01 


2.00E-02 


495728 


4096 (4) 


1.60E-01 


3.75E-00 


no 


3.70E+02 


6.50E-01 


2188068 


16384 (5) 


5.20E-01 


5.98E-00 


176 


-6.57E-h03 


-1.15E-I-01 


9277942 


65536 (6) 


2.01E-00 


8.22E-00 


242 


-1.08E-F05 


-1.90E-f02 


38303868 


262144 (7) 


7.97E-00 


l.OOE+01 


297 


-1.74E-F06 


-3.05E+03 


156415674 


1048576 (8) 


3.19E+01 


1.12E+01 


333 


-2.79E-f07 


-4.90E+04 


634053672 



Table 8 

Scaling with the number of unknowns, A'^, for Helmholtz point sources randomly distributed over a plane with ^ = 2. A linear 

regression on a log-log scale indicates i^C-B ~ JVi028 ^^^^ Nn^ar ~ ]\f^-007 ^ 



N{Ni) 


tACE (sec) 


tpre (sec) 


-^^unique 


tfar (sec) 


tdirect (sec) 


^^near 


1024 (3) 


2.00E-02 


1.50E-01 


24 


8.63E-h01 


2.00E-02 


590526 


4096 (4) 


1.70E-01 


8.90E-01 


168 


2.62E-h03 


6.40E-01 


2364674 


16384 (5) 


5.60E-01 


1.50E-00 


288 


-4.97E+04 


-1.15E+01 


9511212 


65536 (6) 


2.14E-00 


2.08E-00 


408 


-7.72E+05 


-1.89E+02 


38504020 


262144 (7) 


8.24E-00 


2.44E-00 


528 


-1.24E+07 


-3.04E+03 


156415674 


1048576 (8) 


3.23E+01 


2.60E-00 


592 


-1.99E+08 


-4.88E+04 


634053672 
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Table 9 

Scaling with the number of unknowns, A'^, for Coulomb point sources randomly distributed over a cube with /i = 3. A linear 

regression on a 1 



g-log scale indicates tj^cE ~ A''-'^""^'' and N„^ar ^ 


^ j^im2^ 






N (Ni) 


tACE (sec) 


^pre \SBCj 


^^unique 


tfar (sec) 


tdirect (sec) 


Nnear 


4096 (3) 


2.10E-01 


7.26E-00 


218 


2.25E+03 


6.40E-01 


7081842 


32768 (4) 


6.76E-00 


3.91E+01 


2290 


-2.36E+05 


-6.71E+02 


56904834 


262144 (5) 


2.49E+01 


5.32E+01 


3678 


-1.58E+07 


-4.51E+03 


458747022 


2097152 (6) 


1.72E+02 


6.62E+01 


5066 


-1.02E+09 


-2.90E+05 


3724629570 
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